Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique
Abstract
In this work, we propose a method to identify the baseline, an imaginary line running through the entire text-line, with reference to which the vertical extents of Persian characters can be accurately interpreted. When a handwritten Persian text-line is curved, its baseline is curved as well. We propose a novel piece-wise painting scheme that prepares patches of black and white blocks all along the text-line, identifies candidate points, and regresses a curve through these candidate points to trace the baseline. The baseline is then stretched straight horizontally, and the characters are de-tilted so that the text-line aligns properly with the horizontal baseline. The proposed algorithm is evaluated on 108 Persian handwritten text-lines containing 3612 subwords; experimental analysis showed that 91.2% of the subwords are accurately aligned. The scheme is further tested on another dataset containing 600 text-lines [13], where it achieves more accurate results than those reported in the state of the art for the same dataset. The effectiveness of aligning text-lines linearly is demonstrated through OCR readability tests on tilted printed English text-lines and on the corresponding transformed text-lines obtained using the proposed procedure.
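The pipeline summarized in the abstract (paint stripes, pick candidate points, regress a curve, straighten) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not say how candidate points are chosen or which regression is used, so the sketch assumes the densest ink row of each vertical stripe as the candidate point, a low-degree polynomial fit, and a per-column vertical shift for straightening. The function names and the parameters `stripe_width` and `degree` are hypothetical, and the character de-tilting step is omitted.

```python
import numpy as np


def trace_baseline(ink, stripe_width=8, degree=2):
    """Estimate a baseline row for every column of a binary text-line image.

    `ink` is a 2-D array with 1 for ink pixels and 0 for background.
    Assumption: the candidate point of a stripe is its densest ink row.
    """
    height, width = ink.shape
    xs, ys = [], []
    for x0 in range(0, width, stripe_width):
        stripe = ink[:, x0:x0 + stripe_width]
        # "Paint" each row of the stripe with its average ink level.
        row_density = stripe.mean(axis=1)
        if row_density.max() == 0:
            continue  # empty stripe (e.g. a gap between words)
        xs.append(x0 + stripe.shape[1] / 2.0)
        ys.append(int(row_density.argmax()))
    if len(xs) < 2:
        raise ValueError("need at least two non-empty stripes")
    # Regress a polynomial curve through the candidate points.
    coeffs = np.polyfit(xs, ys, deg=min(degree, len(xs) - 1))
    return np.polyval(coeffs, np.arange(width))


def straighten(ink, baseline):
    """Shift every column vertically so the traced baseline becomes horizontal."""
    height, width = ink.shape
    target = int(round(float(np.median(baseline))))
    out = np.zeros_like(ink)
    rows = np.arange(height)
    for x in range(width):
        shift = target - int(round(float(baseline[x])))
        dst = rows + shift
        ok = (dst >= 0) & (dst < height)  # drop pixels shifted off the image
        out[dst[ok], x] = ink[rows[ok], x]
    return out


if __name__ == "__main__":
    # Synthetic slanted stroke standing in for a skewed text-line.
    img = np.zeros((40, 64), dtype=np.uint8)
    for x in range(64):
        img[10 + x // 4, x] = 1
    flat = straighten(img, trace_baseline(img, stripe_width=8, degree=1))
    print(np.nonzero(flat.any(axis=1))[0])  # rows that still contain ink
```

Shifting whole columns keeps every ink pixel but does not rotate individual characters, which is why a separate de-tilting step, as described in the abstract, would still be needed on real handwriting.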
Related resources
A new scheme for unconstrained handwritten text-line segmentation
Variations in inter-line gaps and skewed or curled text-lines are some of the challenging issues in segmentation of handwritten text-lines. Moreover, overlapping and touching text-lines that frequently appear in unconstrained handwritten text documents significantly increase segmentation complexities. In this paper, we propose a novel approach for unconstrained handwritten text-line segmentatio...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Region growing based segmentation algorithm for typewritten and handwritten text recognition
This paper presents a new high-accuracy technique for recognizing both typewritten and handwritten English and Arabic texts without thinning. After segmenting the text into lines (horizontal segmentation) and the lines into words, it separates each word into its letters. Separating a text line (row) into words and a word into letters is performed by using the region growing technique (implicit s...
A New Method for Extracting Named Entities in Classical Arabic
In Natural Language Processing (NLP) studies, developing resources and tools contributes to the extension and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers due to its significant impact on improving other NLP tasks such as machine translation, information retrieval, question answering, query result...